[SOUND] This lecture is about
the Text-Based Prediction.
In this lecture, we're going to
start talking about the mining
a different kind of knowledge,
as you can see here on this slide.
Namely we're going to use text
data to infer values of some other
variables in the real world that may
not be directly related to the text.
Or only remotely related to text data.
So this is very different
from content analysis or
topic mining where we directly
characterize the content of text.
It's also different from opinion mining or
sentiment analysis,
which still have to do is
characterizing mostly the content.
Only that we focus more
on the subject of content
which reflects what we know
about the opinion holder.
But this only provides limited
review of what we can predict.
In this lecture and the following
lectures, we're going to talk more about
how we can predict more
Information about the world.
How can we get the sophisticated patterns
of text together with other kind of data?
It would be useful first to take a look
at the big picture of prediction, and
data mining in general, and
I call this data mining loop.
So the picture that you are seeing right
now is that there are multiple sensors,
including human sensors,
to report what we have seen in
the real world in the form of data.
Of course the data in the form
of non-text data, and text data.
And our goal is to see if we
can predict some values of
important real world
variables that matter to us.
For example, someone's house condition,
or the weather, or etc.
And so these variables would be important
because we might want to act on that.
We might want to make
decisions based on that.
So how can we get from the data
to these predicted values?
Well in general we'll first have to do
data mining and analysis of the data.
Because we, in general, should treat
all the data that we collected
in such a prediction problem set up.
We are very much interested in
joint mining of non-text and
text data, which should
combine all the data together.
And then, through analysis,
generally there
are multiple predictors of this
interesting variable to us.
And we call these features.
And these features can then be
put into a predictive model,
to actually predict the value
of any interesting variable.
So this then allows us
to change the world.
And so
this basically is the general process for
making a prediction based on data,
including the test data.
Now it's important to emphasize
that a human actually
plays a very important
role in this process.
Especially because of
the involvement of text data.
So human first would be involved
in the mining of the data.
It would control the generation
of these features.
And it would also help us
understand the text data,
because text data are created
to be consumed by humans.
Humans are the best in consuming or
interpreting text data.
But when there are, of course, a lot of
text data then machines have to help and
that's why we need to do text data mining.
Sometimes machines can see patterns in
a lot of data that humans may not see.
But in general human would
play an important role in
analyzing some text data, or applications.
Next, human also must be involved
in predictive model building and
adjusting or testing.
So in particular, we will have a lot
of domain knowledge about the problem
of prediction that we can build
into this predictive model.
And then next, of course, when we have
predictive values for the variables,
then humans would be involved in
taking actions to change a word or
make decisions based on
these particular values.
And finally it's interesting
that a human could be involved
in controlling the sensors.
And this is so that we can
adjust to the sensors to collect
the most useful data for prediction.
So that's why I call
this data mining loop.
Because as we perturb the sensors,
it'll collect the new data and
more useful data then we will
obtain more data for prediction.
And this data generally will help
us improve the predicting accuracy.
And in this loop,
humans will recognize what additional
data will need to be collected.
And machines, of course,
help humans identify what data
should be collected next.
In general, we want to collect data
that is most useful for learning.
And there was actually a subarea in
machine learning called active learning
that has to do with this.
How do you identify data
points that would be most helpful
in machine learning programs?
If you can label them, right?
So, in general,
you can see there is a loop here from
data acquisition to data analysis.
Or data mining to prediction of values.
And to take actions to change the word,
and then observe what happens.
And then you can then
decide what additional data
have to be collected by
adjusting the sensors.
Or from the prediction arrows,
you can also note what additional data
we need to acquire in order to
improve the accuracy of prediction.
And this big picture is
actually very general and
it's reflecting a lot of important
applications of big data.
So, it's useful to keep that in mind
while we are looking at some text
mining techniques.
So from text mining perspective and
we're interested in text based prediction.
Of course, sometimes texts
alone can make predictions.
And this is most useful for
prediction about human behavior or
human preferences or opinions.
But in general text data will be
put together as non-text data.
So the interesting questions
here would be, first,
how can we design effective predictors?
And how do we generate such
effective predictors from text?
And this question has been addressed to
some extent in some previous lectures
where we talked about what kind of
features we can design for text data.
And it has also been
addressed to some extent by
talking about the other knowledge
that we can mine from text.
So, for example, topic mining can be very
useful to generate the patterns or topic
based indicators or predictors that can
be further fed into a predictive model.
So topics can be intermediate
recognition of text.
That would allow us to do
design high level features or
predictors that are useful for
prediction of some other variable.
It may be also generated from original
text data, it provides a much better
implementation of the problem and
it serves as more effective predictors.
And similarly similar analysis can
lead to such predictors, as well.
So, those other data mining or
text mining algorithms can be
used to generate predictors.
The other question is, how can we join
the mine text and non-text data together?
Now, this is a question that
we have not addressed yet.
So, in this lecture,
and in the following lectures,
we're going to address this problem.
Because this is where we can generate much
more enriched features for prediction.
And allows us to review a lot of
interesting knowledge about the world.
These patterns that
are generated from text and
non-text data themselves can sometimes,
already be useful for prediction.
But, when they are put together
with many other predictors
they can really help
improving the prediction.
Basically, you can see text-based
prediction can actually serve as a unified
framework to combine many text mining and
analysis techniques.
Including topic mining and any content
mining techniques or segment analysis.
The goal here is mainly to evoke
values of real-world variables.
But in order to achieve the goal
we can do some other preparations.
And these are subtasks.
So one subtask could mine the content
of text data, like topic mining.
And the other could be to mine
knowledge about the observer.
So sentiment analysis, opinion.
And both can help provide predictors for
the prediction problem.
And of course we can also add non-text
data directly to the predicted model, but
then non-text data also helps
provide a context for text analyst.
And that further improves the topic
mining and the opinion analysis.
And such improvement often leads to more
effective predictors for our problems.
It would enlarge the space of patterns
of opinions of topics that we can
mine from text and
that we'll discuss more later.
So the joint analysis of text and
non-text data can be actually
understood from two perspectives.
One perspective,
we have non-text can help with testimony.
Because non-text data can
provide a context for
mining text data provide a way to
partition data in different ways.
And this leads to a number of type of
techniques for contextual types of mining.
And that's the mine text in
the context defined by non-text data.
And you see this reference here, for
a large body of work, in this direction.
And I will need to highlight some of them,
in the next lectures.
Now, the other perspective is text data
can help with non-text
data mining as well.
And this is because text
data can help interpret
patterns discovered from non-text data.
Let's say you discover some frequent
patterns from non-text data.
Now we can use the text data
associated with instances
where the pattern occurs as well as
text data that is associated with
instances where the pattern
doesn't look up.
And this gives us two sets of text data.
And then we can see what's the difference.
And this difference in text data is
interpretable because text content is
easy to digest.
And that difference might
suggest some meaning for
this pattern that we
found from non-text data.
So, it helps interpret such patterns.
And this technique is
called pattern annotation.
And you can see this reference
listed here for more detail.
So here are the references
that I just mentioned.
The first is reference for
pattern annotation.
The second is, Qiaozhu Mei's
dissertation on contextual text mining.
It contains a large body of work on
contextual text mining techniques.
[MUSIC]

